Space-Efficient Whole Genome Comparisons with Burrows-Wheeler Transforms
Abstract
The starting point for any alignment of mammalian genomes is the computation of exact matches satisfying various criteria. Time-efficient, O(n), data structures for this computation, such as the suffix tree, require O(n log(n)) space, several times the space of the genomes themselves. Thus, any reasonable whole-genome comparative project finds itself requiring tens of gigabytes of RAM to maintain time-efficiency, which is beyond most modern workstations. With a new data structure, the compressed suffix array (CSA) implemented via the Burrows-Wheeler transform, we can trade time-efficiency for space-efficiency, taking O(n log(n)) time but running in O(n) space, typically in total space less than or equal to that of the genomes themselves. If space is more expensive than time, this is an appropriate approach to consider. The most space-efficient implementation of this data structure requires, in the worst case, 5 bits per nucleotide character to build on-line and 2.5 bits per character to store once built. We present a description of this data structure and how it is used to obtain matches. An implementation (called bbbwt) is demonstrated by aligning two mammalian genomes on a modest workstation equipped with under 2 GB of free RAM, in less time than implementations of other data structures.
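To make the idea of finding exact matches through the Burrows-Wheeler transform concrete, here is a minimal Python sketch of the standard backward-search technique. It is not the bbbwt implementation described in the abstract: all function names are hypothetical, the suffix array is built naively, and the suffix array and occurrence tables are stored uncompressed, so the sketch does not reach the 2.5 to 5 bits per character quoted above. It only illustrates how a pattern is matched using the transform.

```python
def suffix_array(text):
    """Naive suffix array (sorts whole suffixes); for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_index(sequence):
    """Build an uncompressed BWT index over `sequence` (hypothetical helper).

    Returns (sa, bwt, C, occ), where
      C[c]      = number of characters in the text strictly smaller than c
      occ[c][i] = number of occurrences of c in bwt[:i]
    """
    assert "$" not in sequence, "'$' is reserved as the end-of-text sentinel"
    text = sequence + "$"  # '$' sorts before A, C, G, T in ASCII
    sa = suffix_array(text)
    # BWT[i] is the character that precedes the i-th smallest suffix
    # (text[-1] wraps around to the sentinel for the suffix starting at 0).
    bwt = "".join(text[i - 1] for i in sa)

    alphabet = sorted(set(text))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)

    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, bwt, C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching `pattern`."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_exact_matches(pattern, index):
    """Return the sorted start positions of `pattern` in the indexed sequence."""
    sa, bwt, C, occ = index
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = build_index("ACGTACGTTACG")
    print(find_exact_matches("ACG", index))  # start positions of every exact match
```

Each step of the backward search needs only the counts `C` and `occ`, which is what allows a compressed representation to replace the suffix tree. As a rough size check under an assumed round figure of 3 × 10^9 nucleotides for one mammalian genome, 5 bits per character comes to about 1.9 × 10^9 bytes during construction and 2.5 bits per character to about 0.94 × 10^9 bytes once built.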
Similar resources
Solution to time fractional generalized KdV of order 2q+1 and system of space fractional PDEs
Abstract. In this work, it has been shown that the combined use of exponential operators and integral transforms provides a powerful tool to solve time fractional generalized KdV of order 2q+1 and certain fractional PDEs. It is shown that exponential operators are an effective method for solving certain fractional linear equations with non-constant coefficients. It may be concluded that the com...
Comparison of whole genome assemblies of the human genome.
A fundamental problem in the human genome project is uncovering the correct assembly of the human genome. Many studies, including transcriptional analysis, SNP detection and characterization, gene finding and EST clustering, use genome assemblies as templates so it is important to determine the consistency among the various whole genome assemblies. A comparison of the order and orientation of t...
Predicting CpG Islands and DNA Methylation in the Cow Genome Using DNA Microarray Meta-Analysis and Genome Wide Scanning
DNA methylation is a type of epigenetic change that directly affects DNA. In mammals, DNA methylation is essential for fetal development and stem cell differentiation, and it occurs mainly within CpG islands. In this study, two methods were used to study the DNA methylation profile of the cow genome. In the first method, the DNA methylation profile of the differentially expres...
Journal title:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 12, Issue 4
Pages -
Publication date 2005